Abstract

Non-long terminal repeat retroelements continue to impact the human genome through cis-activity of long interspersed
element-1 (LINE-1 or L1) and trans-mobilization of Alu. Current activity is dominated by modern subfamilies of these elements,
leaving behind an evolutionary graveyard of extinct Alu and L1 subfamilies. Because Alu is a nonautonomous element that relies
on L1 to retrotranspose, there is the possibility that competition between these elements has driven selection and antagonistic
coevolution between Alu and L1. Through analysis of synonymous versus nonsynonymous codon evolution across L1 subfamilies,
we find that the C-terminal ORF2 cys domain experienced a dramatic increase in amino acid substitution rate in the transition
from L1PA5 to L1PA4 subfamilies. This observation coincides with the previously reported rapid evolution of ORF1 during the
same transition period. Ancestral Alu sequences have been previously reconstructed, as their short size and ubiquity have made it
relatively easy to retrieve consensus sequences from the human genome. In contrast, creating constructs of extinct L1 copies is a
more laborious task. Here, we report our efforts to recreate and evaluate the retrotransposition capabilities of two ancestral L1
elements, L1PA4 and L1PA8, that were active ~18 and ~40 Ma, respectively. Relative to the modern L1PA1 subfamily, we find
that both elements are similarly active in a cell culture retrotransposition assay in HeLa, and both are able to efficiently
trans-mobilize Alu elements from several subfamilies. Although we observe some variation in Alu subfamily retrotransposition
efficiency, any coevolution that may have occurred between LINEs and SINEs is not evident from these data. Population
dynamics and stochastic variation in the number of active source elements likely play an important role in individual LINE or
SINE subfamily amplification. If coevolution also contributes to changing retrotransposition rates and the progression of
subfamilies, cell factors are likely to play an important mediating role in changing LINE-SINE interactions over evolutionary time.
Introduction

Long interspersed element-1 (LINE-1 or L1) is the dominant
human non-long terminal repeat autonomous retroelement
and has been active in mammalian genomes for more than
170 My (Smit 1999). The human genome has been
significantly impacted by the activity of L1, through
both self-mobilization and trans-mobilization of the SINE
Alu. Together, L1 and Alu repeat sequences account for at
least a third of the human genome (Lander et al. 2001),
and more recent analyses suggest this may be a gross
underestimation (de Koning et al. 2011). Following
retrotransposition, the active Alu and L1 copies lose functionality
as they accumulate mutations at a neutral rate, leaving
older copies with higher sequence degradation than newer
copies. Phylogenetic analysis of L1 families has shown that L1
subfamilies follow a linear pattern, whereby a single L1 lineage
proliferates, differentiates, and is eventually replaced by a new
dominant subfamily (Deininger et al. 1992; Smit et al. 1995;
Boissinot and Furano 2001). Alu subfamilies follow a similar
pattern with a progression of dominant subfamilies over the
course of primate evolution (Shen et al. 1991).



L1 elements contain two open reading frames (ORF1 and
ORF2) that code for proteins essential for L1 retrotransposition.
Trans-mobilization of short interspersed elements
(SINEs), by contrast, is only ORF2 dependent (Dewannieux
et al. 2003; Wallace et al. 2008). Because Alu requires L1 to
retrotranspose, it is conceivable that competition between
these retroelements has triggered antagonistic coevolution
between LINEs and SINEs and altered their interactions over
evolutionary time. In some cases, one or both elements could
be driven to extinction within a lineage. For example,
coextinction of L2 and its proposed SINE partner, MIR, has
been observed in humans (Lander et al. 2001). In another
example, sigmodontine rodents lost both functional L1 and
B1 SINEs (Rinehart et al. 2005). In this case, B1 silencing appears
to have preceded L1 extinction. Thus, the mere presence
of an active L1 is not necessarily sufficient to support SINE
activity, suggesting that host factors and/or changes within
the SINE itself could affect retrotranspositional capability.

Several studies have used retroelement insertion sequence
divergence and/or presence/absence data from primate
genomes (Shen et al. 1991; Ohshima et al. 2003; Khan et al.
2006; Bennett et al. 2008) to evaluate the temporal proliferation
of mammalian retroelements. We created a simplified
schematic of some of these findings in figure 1 by showing the
amplification history of the major Alu subfamilies relative to
that of L1. L1 went through a long period of high amplification,
followed by a steady decline in activity from approximately
65 Ma to the current relatively low rate. Evaluation
of Alu amplification reveals that different Alu subfamilies
experienced peak activity during discrete periods, with
waves of Alu subfamily activity occurring at ~15-20 Ma,
~40-50 Ma, and ~55 Ma with the proliferation of Alu Y, S,
and J subfamilies, respectively. The relatively less abundant
young Alu Ya and Yb subfamilies are currently active and
most likely account for all human-specific insertions (Batzer
and Deininger 2002; Hedges et al. 2004). The decline in L1
activity roughly coincides with the initial emergence of Alu
elements (~65 Ma). Ohshima et al. (2003) compared the
evolutionary proliferation of Alu and L1 repeats in humans,
as well as processed pseudogenes, and showed that peak
Alu and pseudogene amplification occurred simultaneously
at approximately 40-50 Ma. This observation led to the suggestion
that the dominant ancestral L1 subfamilies of the
era might have mobilized RNAs in trans at accelerated rates
relative to other L1 subfamilies (Ohshima et al. 2003). During
this peak period of amplification, the Alu J and Alu S subfamilies
were actively generating the majority (~80% of the
1.1 million) of Alu copies in the human genome. Interestingly,
the period of elevated Alu Y subfamily amplification
(15-20 Ma) also coincides with the emergence of another
L1-dependent nonautonomous element, SINE/VNTR/Alu
(SVA), in the hominid lineage (Wang et al. 2005).



Here, we present data that demonstrate the reconstruction
of functional full-length L1 elements from two extinct
human L1 subfamilies that were active during periods of
increased Alu amplification or rapid L1 protein evolution.
We show that they are retrocompetent in an ex vivo tissue
culture assay, both for L1 cis-mobilization and trans-mobilization
of Alu. We find limited evidence of differential
associations between Alu and L1 subfamilies, suggesting that
other factors are likely the primary mediators of their changing
interactions over evolutionary time.

Materials and Methods 

Constructs 

A schematic of the basic Alu- and L1-tagged vectors is shown
in figure 2. The "SINE"-neoTET constructs (pAluY-neoTET,
pAluSg1-neoTET, pAluSx-neoTET, and pAluJo-neoTET) were created
by substituting the Ya5 Alu element from pAluYa5-neoTET
(Kroutter et al. 2009) with the different Alu subfamily
consensus sequences using a BamHI site (5' of the 7SL promoter
enhancer sequence) and the introduced AatII site (fig. 2). The
AluSx consensus sequence (previously known as AluPS) differs
at position 225 (G instead of C) (Aleman et al. 2000).

JM101/L1.3, referred to as "wild type" L1, contains a
full-length copy of the L1.3 element tagged with the mneoI
indicator cassette cloned in pCEP4 (Invitrogen) (Dombroski
et al. 1993; Sassaman et al. 1997).

Reconstruction of Extinct L1 Elements

The codon-optimized L1PA4 and L1PA8 and wild-type L1PA8
ORF1 and ORF2 consensus sequences were synthesized by



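[Figure 1 schematic. Active L1PA subfamilies shown above the timeline: 1, 2, 3, 4, 5, 6, 7, 8, 8A, 10, 11, 13B, 12, 13A.
Inset, Alu subfamily copy number: J, 160,000; S, 550,000; Y, 125,000; Ya and Yb, ~5,000.
Time axis (MYA): 5, 15, 20, 30, 40, 50, 60, followed by the pre-Alu period.]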

Fig. 1. Age distribution of Alu and L1 subfamilies. This schematic depicts the rate of insertion for Alu and L1 elements over evolutionary time. The
relative insertion frequency for L1 (all subfamilies combined) is represented by the gray dotted line and is set at a scale 4x larger relative to Alu to
emphasize the changing rate of amplification coinciding with the emergence of the indicated Alu subfamilies. The corresponding active subfamilies
from the L1PA family at each time period are indicated above. Individual Alu subfamilies are shown with their total copy number indicated in the inset.
The two timeframes that include peak activity periods for L1PA4 (~18 Ma) and L1PA8 (~40 Ma) are indicated by shaded boxes. Data were adapted
from Shen et al. (1991), Ohshima et al. (2003), Khan et al. (2006), and Bennett et al. (2008).







Wagstaff et al. • doi:10.1093/molbev/mss202



MBE 



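[Figure 2 schematic. LINE-1 constructs: CMV promoter driving ORF1 and ORF2, with either (1) no tag or (2) the mneoI tag at the 3'-end;
separate CMV-ORF1 (myc tag) and CMV-ORF2 expression constructs. Alu constructs: 7SL upstream sequence, Alu, neoTET.]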



Fig. 2. Schematic of the L1 and Alu constructs. A representation of the
basic components of the constructs is shown. The L1 constructs contain
the codon-optimized ORF1 and ORF2 separated by the L1RP inter-ORF
sequence (Wagstaff et al. 2011) or the wild-type sequence of the L1PA8
ORFs. Two types of L1 constructs were built, which differ only at their
3'-end: 1, untagged containing the SV40 polyadenylation signal (pA)
and 2, tagged with the neomycin indicator cassette designed for
retrotransposition assays (mneoI). The individual ORF1 (blue) and ORF2
(purple) sequences were cloned downstream of the CMV promoter
of the expression vector pBudCE4.1 (Invitrogen). The ORF1 was
cloned so that the protein will contain a myc-his tag (myc) at the
carboxy terminus. The Alu subfamily (yellow) constructs are tagged
with the neomycin indicator cassette (neoTET) designed for
retrotransposition assays. RNA transcription is performed by the CMV promoter
(L1) or the internal pol III promoter of the Alu enhanced by the
upstream sequence of the 7SL gene (shown as gray arrows). The
retrotransposition indicator cassettes (mneoI or neoTET) contain an inverted
neomycin resistance gene disrupted by an intron that will splice only
from a transcript generated by the CMV or Alu promoter. The neoTET
contains a modification of the ribozyme from Tetrahymena (represented
as a looped line) that functions as a self-splicing intron
(Esnault et al. 2002). Only retrotransposed copies of the spliced RNA
will confer G418 resistance. Some of the unique restriction sites used in
the construction of the vectors are shown.



Blue Heron Biotechnology, Inc (Bothell, WA) or GenScript.
Codon optimization of the sequences was performed using
Primo Optimum 3.4 (http://www.changbioscience.com/primo/primoo.html).
Note: the L1PA8 constructs contain
the corrected version of the consensus sequence (table 2).
All bicistronic L1 constructs were built using
pBS-L1PA1CHmneo as base (Wagstaff et al. 2011) by substituting
the L1PA1 ORF1 and ORF2 coding sequences with
the corresponding synthesized L1 sequences. Different
cassettes were added at the 3'-end of each L1 subfamily
construct (fig. 2):

pBS-L1PA1CHmneo, pBS-L1PA4CHmneo, and pBS-L1PA8CHmneo,
referred to as the "tagged" vectors, contain the
codon-optimized ORF1 and ORF2 of the consensus sequence
of each subfamily and the mneoI cassette including
the SV40 polyadenylation signal (pA) from JM101/L1.3
(Dombroski et al. 1993).

pBS-L1PA8WTmneo contains the corrected version of
the "wild type" consensus sequence of L1PA8 (Khan
et al. 2006), with the 11 modified codons as described
in table 2.

pBS-L1PA1CHnotag, pBS-L1PA4CHnotag, and pBS-L1PA8CHnotag,
referred to as the "no tag" constructs, contain an
SV40 pA at the 3'-end that was introduced into the
EcoRI-FseI sites (fig. 2).

The individual ORFs of the different L1 elements were all
cloned into the expression vector pBudCE4.1 (Invitrogen),
under control of the cytomegalovirus (CMV) promoter:

pBudORF2CH (Wagstaff et al. 2011) and pBudORF1opt
(Wallace et al. 2008) were created using the codon-optimized
L1RP as a source for the ORF2 and ORF1 coding
sequences. These constructs are used for the expression
of the L1PA1 ORF1 and ORF2.

pBudORF1PA1CH-myc, pBudORF1PA4CH-myc, and pBudORF1PA8CH-myc
were generated by cloning the polymerase chain
reaction (PCR)-amplified codon-optimized consensus sequences
of each ORF into the HindIII-BamHI sites of the
pBudCE4.1 vector in a manner that removes the stop
codon of the ORF1, so the expressed protein will contain
the myc-his tag at the carboxy terminus (fig. 2). The following
primers were used in the amplification of ORF1:
5'-AGACCCAAGCTTAGCTAAAACCACAAAGATC-3' and
5'-TGTTCGGATCCGATCTTGGTGTGCTTCTGCAGGGG-3' for
ORF1PA8 or 5'-TGTTCGGATCCCATCTTGGCGTGGTTTTGCAGGGG-3'
for ORF1PA1 and ORF1PA4.

pBudORF2PA4CH and pBudORF2PA8CH were generated by
cloning the codon-optimized consensus sequences of
each ORF that retain the stop codon into the HindIII-BamHI
sites of the pBudCE4.1 vector (fig. 2).

Plasmids were independently purified in triplicate, either by
alkaline lysis followed by two rounds of cesium chloride buoyant
density centrifugation or with the QIAGEN Plasmid Plus
Maxi kit, following the manufacturer's protocol. DNA purity
and quality were also assessed visually on ethidium
bromide-stained agarose gels of electrophoresed aliquots.
All new constructs were sequence verified.

Analysis of Nonsynonymous and Synonymous
Substitutions across L1 Subfamilies
Consensus sequences for the analysis were from Khan et al.
(2006) but with the 11 modified codons of L1PA8 ORF2 as
detailed in table 2. Domain breakpoints for the ORF2 protein
were determined as follows, with residue numbers corresponding
to L1PA1 ORF2p: the N terminus endonuclease
included residues 1-239, in accordance with the well-established
domain (Feng et al. 1996; Cost et al. 2002;
Weichenrieder et al. 2004); the reverse transcriptase (RT)
domain included residues 511-773, following the boundaries
as defined by the Conserved Domain Database (CDD v3.05-42589
PSSMs [Marchler-Bauer et al. 2005]); and the remaining
"inter endo-RT" (residues 240-510) and "cys" (residues
774-1,275) domains were simply the remaining regions of
ORF2p 5' of the RT and 3' of the RT, respectively.
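
The residue boundaries above can be related to the nucleotide ranges listed in the table 1 column headers with a short calculation. The following is a minimal sketch, assuming that ORF2 begins at consensus nucleotide 1,120 (the start of the "ORF2 Full Length" range in table 1) and that positions are 1-based and inclusive; the names `ORF2_START_NT`, `DOMAINS`, and `residues_to_nt` are illustrative and not taken from the original analysis.

```python
# Assumption: ORF2 starts at consensus nucleotide 1,120 (table 1 header, 1,120-4,944).
ORF2_START_NT = 1120

# Residue boundaries as stated in the text (1-based, inclusive).
DOMAINS = {
    "endo": (1, 239),
    "inter endo-RT": (240, 510),
    "RT": (511, 773),
    "cys": (774, 1275),
}


def residues_to_nt(first_res, last_res, orf_start=ORF2_START_NT):
    """Convert an inclusive residue range to an inclusive nucleotide range."""
    start = orf_start + 3 * (first_res - 1)  # first base of the first codon
    end = orf_start + 3 * last_res - 1       # last base of the last codon
    return start, end


nt_ranges = {name: residues_to_nt(*span) for name, span in DOMAINS.items()}

for name, (start, end) in nt_ranges.items():
    print(f"{name}: {start}-{end}")
```

Under these assumptions, the four computed ranges reproduce the domain coordinates given in the table 1 header.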
Table 1. Analysis of Nonsynonymous (Ka) versus Synonymous (Ks) Substitutions of the Consensus Sequence of the Individual ORF2 Domains of
the L1PA Family.

Ka/Ks Analyses on L1 ORF Domains (Nucleotide Positions from Consensus Sequences)(a)

L1 Pair             ORF1 Full       ORF2 Full       ORF2 Endo       ORF2 Inter      ORF2 RT         ORF2 Cys
                    Length          Length          Domain          Endo-RT Domain  Domain          Domain
                    (31-1,053)      (1,120-4,944)   (1,120-1,836)   (1,837-2,649)   (2,650-3,438)   (3,439-4,944)
L1PA2 → L1PA1       0.2749          0.1951          0.2927          0.1638          0.0558          0.2371
L1PA3 → L1PA2       0.2738          0.2473          0.8786          0.1806          0.1861          0.1774
L1PA4 → L1PA3       1.4588          0.0846          0.0486          0.0675          0.0000          0.1906
L1PA5 → L1PA4       2.4666          0.2551          0.0728          0.0000          0.0000          1.5992
L1PA6 → L1PA5       0.3911          0.1443          0.2429          0.1201          0.0866          0.1775
L1PA7 → L1PA6       0.6115          0.1914          0.3219          0.2740          0.0869          0.1635
L1PA8 → L1PA7       0.4187          0.2245          0.2873          0.5556          0.2829          0.0884
L1PA8A → L1PA8      0.2256          0.2068          0.3998          0.1974          0.1003          0.2125
L1PA10 → L1PA8A     0.3260          0.1272          0.2903          0.1439          0.0128          0.1249
L1PA11 → L1PA10     0.4537          0.1708          0.0862          0.3012          0.0518          0.2127
L1PA13B → L1PA11    0.3500          0.1936          0.4137          0.1849          0.0337          0.2428
L1PA12 → L1PA13B    0.4300          0.1813          0.2669          0.2266          0.0359          0.2170
L1PA13 → L1PA11     0.2204          0.2530          0.3845          0.2688          0.0642          0.2994

Note: Numbers in bold indicate an increase in amino acid substitution rate.
(a) Consensus sequences from Khan et al. (2006).



Table 2. Analysis of Codon Changes Involved in the Modified L1PA8 Consensus Sequence.

L1PA8 (a)      47       101      104      347      375      716      755      777      838      918      1092
Copy 1         ACA (T)  AAC (K)  ATG (M)  ATG (M)  -        TAA (0   ACC (S)  GTG (V)  ACA (T)  GAC (D)  ATG (M)
Copy 2         ATG (M)  AAC (K)  ATG (M)  CCA (P)  ACA (R)  GAA (G)  ACT (S)  GTG (V)  ACA (T)  AAC (N)  GTG (V)
Copy 3         ACC (T)  AAC (K)  ATG (M)  TTG (L)  ACA (R)  CCA (P)  ACT (S)  ATG (M)  AAA (K)  CAC (H)  ATG (M)
Copy 4         ACC (T)  AAC (K)  ATG (M)  CTG (L)  ACA (R)  CAA (Q)  ACT (S)  TTG (L)  ACA (T)  AAA (K)  ATG (M)
Copy 5         ATG (M)  AAC (K)  ATG (M)  CTG (L)  ACA (T)  CAA (Q)  ACT (S)  GTG (V)  ACA (T)  GAC (D)  ATG (M)
Copy 6         ACA (T)  AAC (K)  ATG (M)  CTG (L)  AGA (R)  CCA (P)  ACT (S)  ATG (M)  ACA (T)  AAC (N)  GTG (V)
Copy 7         ACA (T)  AAC (K)  ATG (M)  TTG (L)  ACA (R)  CAA (Q)  AAT (N)  ATG (M)  GCA (A)  TAC (Y)  ATG (M)
Copy 8         ACA (T)  AAC (K)  ATG (M)  CTA (L)  AGA (R)  CCA (P)  ACT (S)  GTG (V)  ACA (T)  GAC (D)  ATT (I)
Copy 9         ACA (T)  AAC (K)  ATG (M)  CTG (L)  AGA (R)  CAA (Q)  ACC (S)  GTG (V)  CCA (P)  AAC (N)  ATG (M)
Copy 10        ACC (T)  AAC (K)  ATG (M)  CTG (L)  AGA (R)  CAA (Q)  GGT (C)  GTG (V)  ACA (T)  GAC (D)  GTG (V)
Copy 11        ACA (T)  AAC (K)  ATG (M)  TTG (L)  -        CAC (H)  ACT (S)  GTG (V)  ATA (I)  GAC (D)  ATG (M)
Copy 12        ACC (T)  AAC (K)  ATG (M)  CTG (L)  AGA (R)  CAA (Q)  ACT (S)  GTG (V)  ACA (T)  AAC (N)  ATG (M)
Copy 13        ATG (M)  AAC (K)  ATG (M)  CTG (L)  -        CAA (Q)  ACT (S)  CCA (A)  TCA (S)  GAC (D)  ATT (I)
Original (b)   ATG (M)  AAT (N)  ACG (T)  GTG (V)  GGA (G)  CCA (P)  AAT (N)  ATG (M)  GCA (A)  AAC (N)  GTG (V)
Corrected (b)  ACC (T)  AAC (K)  ATG (M)  CTG (L)  AGA (R)  CAA (Q)  ACT (S)  GTG (V)  ACA (T)  GAC (D)  ATG (M)
L1PA7          T        K        M        L        R        Q        S        V        T        D        M
L1PA8A         T        K        I        L        R        Q        N        V        T        D        M

Note: Residues matching the corrected sequence are in bold and highlighted gray.

(a) Amino acid sequence numbers using the ORF2 L1RP sequence as reference. Codons are indicated with amino acid in parentheses. Absent codons (deletions) are designated by
dashes.

(b) The original consensus, corrected derived subfamily, and ancestral subfamily are shown at the bottom for the indicated codon positions.
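
The "most common residue" reading of table 2 can be illustrated with a small majority-rule tally. This is only a sketch of that one step: the helper name `majority_residue` is hypothetical, the two example columns are the amino acids listed in table 2 for positions 47 and 1092 (copies 1 through 13, in order), and the CpG and polymorphism considerations summarized in table 3 are not modeled here.

```python
from collections import Counter


def majority_residue(column):
    """Return the most frequent residue in an alignment column and its count.

    Deletions, written as "-", are ignored. Ties are resolved by first
    occurrence, which is an arbitrary choice made for this sketch.
    """
    counts = Counter(residue for residue in column if residue != "-")
    return counts.most_common(1)[0]


# Amino acids at two codon positions across the 13 genomic copies (table 2).
pos47 = list("TMTTMTTTTTTTM")
pos1092 = list("MVMMMVMIMVMMI")

print(majority_residue(pos47))
print(majority_residue(pos1092))
```

For these two columns the majority residue agrees with the "Corrected" row of table 2 (T at position 47 and M at position 1092).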



Nonsynonymous and synonymous substitution rates between
temporally adjacent L1 subfamilies were computed
using DnaSP v5 (Librado and Rozas 2009).
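
To make the distinction between synonymous and nonsynonymous changes concrete, the sketch below classifies codon differences between two aligned, in-frame coding sequences using the standard genetic code. It is a deliberately simplified illustration, not the DnaSP procedure: it counts differing codons only and does not estimate per-site rates or the Ka/Ks ratios reported in table 1. The function name `count_codon_changes` and the example sequences are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_codon_changes(seq_a, seq_b):
    """Count codons that differ synonymously or nonsynonymously.

    Both sequences must be gap-free, in frame, and of equal length.
    Returns a (synonymous, nonsynonymous) tuple of codon counts.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame, and equal in length")
    syn = nonsyn = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1      # same amino acid, different codon
        else:
            nonsyn += 1   # amino acid replacement
    return syn, nonsyn


# Hypothetical three-codon example: ATG>ACC (M to T), AAT>AAC (N to N), CTG unchanged.
print(count_codon_changes("ATGAATCTG", "ACCAACCTG"))
```

A rate-based analysis would additionally normalize these counts by the number of synonymous and nonsynonymous sites, which this sketch leaves out.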

LINE and SINE Assays 

Transient L1 or Alu retrotransposition assays were performed
as described previously with some minor modifications
(Kroutter et al. 2009). Briefly, HeLa cells (ATCC CCL2) were
seeded in T25 or T75 flasks at a density of 2 x 10^ or 5 x 10^
cells, respectively. Transient transfections were performed
the following day using Lipofectamine Plus (Invitrogen) following
the manufacturer's protocol. L1 retrotransposition
was assayed in T25 flasks by transfecting cells with 0.4 µg of
the L1 constructs. To evaluate Alu retrotransposition, cells
were seeded in six-well plates at a density of 1.0 x 10^ cells
per well. The cells were transfected with 1 µg of the ORF2
expression vector or with 1 µg of the untagged L1 subfamily
construct and varying amounts of the tagged Alu subfamily
constructs (0.1-1 µg) as indicated. Empty vector was used in
the mix to equalize the amount of total DNA used in each
transfection reaction. The following day, the cells were treated
with the appropriate selection media containing 400 µg/ml
Geneticin/G418 (Fisher Scientific). After 14 days, cells
were fixed and stained for 30 min with crystal violet (0.2%
crystal violet in 5% acetic acid and 2.5% isopropanol).
For transfections using the RT inhibitor 2',3'-didehydro-3'-deoxythymidine
(d4t; Sigma-Aldrich), a final concentration
of 50 µM d4t was added to the media at the time of transfection
and maintained with subsequent media changes for a
period of 7 days.

PCR Evaluation of L1 Inserts

Colonies of G418-resistant cells were pooled, and DNA was
extracted using the DNA-Easy kit (Qiagen) following the
manufacturer's recommended protocol. PCR was performed
for 35 cycles at 58°C annealing temperature with a 1 min
extension using Taq polymerase with the following primers
designed to flank the intron disrupting the neomycin gene
(fig. 4B): RNeo-Exon1: 5'-ATGGGATGGGGGATTGAAGAAGATG-3'
and FNeo-Exon2: 5'-GGAAGGTGAGATGAGAGGAGATGG-3'.
Amplification products containing the unspliced
intron are expected to be 1,233 bp, whereas spliced products
with the intron removed are 330 bp.
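
The two expected amplicon sizes imply the length of the removed intron and give a simple rule for reading a gel. The sketch below is illustrative only: the function `classify_amplicon` and the 50 bp tolerance are assumptions introduced here, not part of the original protocol.

```python
# Expected product sizes stated in the text.
UNSPLICED_BP = 1233
SPLICED_BP = 330

# Length of the intron removed by splicing, implied by the two sizes above.
INTRON_BP = UNSPLICED_BP - SPLICED_BP


def classify_amplicon(size_bp, tolerance_bp=50):
    """Label an observed band as spliced, unspliced, or unexpected.

    tolerance_bp is a hypothetical allowance for gel sizing error.
    """
    if abs(size_bp - SPLICED_BP) <= tolerance_bp:
        return "spliced"      # intron removed, consistent with a retrotransposed copy
    if abs(size_bp - UNSPLICED_BP) <= tolerance_bp:
        return "unspliced"    # intron retained
    return "unexpected"


print(INTRON_BP)
print(classify_amplicon(330))
```

Because only spliced copies are expected to confer G418 resistance, a band near 330 bp is the product of interest in pooled resistant colonies.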

Northern Blot Analysis 

Cells were harvested 24 h post-transfection. RNA extraction
and poly(A) selection were performed as described previously
(Perepelitsa-Belancio and Deininger 2003). The polyadenylated
RNA species were evaluated in a 2% (Alu and ORF1
constructs) or a 1% (L1 and ORF2 constructs) agarose-formaldehyde
gel and transferred to a Hybond-N nylon membrane
(Amersham Biosciences). The RNA was cross-linked
to the membrane using ultraviolet (UV) light (GS Gene Linker,
BioRad). The membrane was preincubated in hybridization
solution (30% formamide, 1X Denhardt's solution, 1% SDS,
1 M NaCl, 100 µg/ml salmon sperm DNA, and 100 µg/ml
yeast tRNA) at 60°C for at least 3 h. The DNA templates
containing the T7 promoter for riboprobe generation were
generated by PCR amplification. For the 3' region of the
neomycin gene, we used primers T7neo(-):
5'-TAATAGGAGTGAGTATAAGGAGGAGGGAGGG-3' and Neo northern(+):
5'-GAAGAAGTGGTGAAGAAGG-3'; for the ORF2, primers
T7ORF2ch180: 5'-TAATAGGAGTGAGTATAGGGTGGATGGGGTTGATGTGG-3'
and F-ORF2ch180: 5'-AAGATGATGGGGGGGATGTAGGA-3';
and for the myc-his tag 3' region of the
tagged ORF1, primers T7mychis:
5'-TAATAGGAGTGAGTATAGGGATGTGT-3' and F-mychis:
5'-TGGTGATGGTGATGATGGATGTTGGG-3'. We used a commercially available
construct to generate the riboprobe for β-actin (Ambion).
Riboprobes were generated by incorporating 32P-GTP
(Amersham Biosciences) label using the MAXIscript T7 kit
(Ambion) following the manufacturer's recommended protocol.
The radiolabeled probes were purified by filtration
through a NucAway Spin column (Ambion). Separate hybridizations
were performed overnight with 4-12 x 10^ cpm/ml
of each individual probe at 60°C. The membrane was washed
twice at high stringency (0.1 x saline-sodium citrate [SSC],
0.1% sodium dodecyl sulfate [SDS]) at 60°C before analysis
using a Typhoon Phosphorimager (Amersham Biosciences)
and the ImageQuant software.

Western Blot Analysis 

Two to four T75 flasks of HeLa cells (4 x 10^/flask) were
transiently transfected with 6 µg of plasmid per T75. Cells
were harvested 24 h post-transfection. Equal amounts of protein
extract were electrophoresed on 3-8% Tris-acetate gels
(Invitrogen). Proteins were transferred to a nitrocellulose
membrane with the iBlot gel transfer system using the
manufacturer's recommended settings (Invitrogen). Blots were
blocked overnight in phosphate buffer saline (PBS) pH 7.4,
0.05% Tween 20, 5% nonfat dry milk (Biorad) at 4°C. A mouse
monoclonal anti-myc (clone 9E10, Upstate) was used to
detect the myc-tagged ORF1p. Antibodies against β-actin
and secondary horseradish peroxidase (HRP)-conjugated
antibodies were purchased from Santa Cruz Biotechnology
Inc. The membrane was incubated for 1 h at room temperature
with the primary or secondary antibody diluted 1:500
and 1:5,000 in PBS pH 7.4, 0.05% Tween 20, 3% nonfat dry milk
(Biorad), respectively. Signals were detected using the
SuperSignal West Pico Chemiluminescent Substrate (Pierce,
Rockford, IL) and Amersham ECL hyperfilm (GE Healthcare).

Results 

Selection Criteria for Reconstruction of Extinct L1
Elements

Two criteria were considered before selecting the particular
L1 subfamily members to reconstruct. First, we focused on the
period of high Alu activity when competition with L1 may
have been intense. Amplification of the Alu J and S subfamilies
(fig. 1) contributed approximately 850,000 copies, accounting
for the majority (~80%) of the Alu elements
currently present in the human genome (Shen et al. 1991).
The dominant L1 subfamilies that existed during the different
periods of individual Alu subfamily activity range from
L1PA13 to L1PA1 (fig. 1), with L1PA8 being active during
the peak of Alu insertion and the expansion of the Alu S
subfamilies approximately 40 Ma (Ohshima et al. 2003).
Therefore, we selected L1PA8 as one of two ancestral elements
for reconstruction.

Our second criterion was based on observations of rapid
protein sequence evolution during L1 subfamily progression.
A previous L1 subfamily study (Khan et al. 2006) evaluated
the ratio between the fixation rates of nonsynonymous (Ka)
and synonymous (Ks) mutations on the derived consensus
sequences of the different L1 subfamilies and determined
that the coding sequences of ORF2 have remained relatively
conserved across subfamilies. However, they show that
ORF1 experienced a long spell of positive selection
ranging from ~12 to 40 Ma, with particularly high protein
evolution approximately 15-20 Ma during the transition
from L1PA5 as the dominant subfamily to L1PA3. We
re-evaluated these data using the published consensus sequences
(Khan et al. 2006), updated with the changes to
the L1PA8 consensus described in the next section, by implementing
a similar Ka/Ks analysis (table 1) on ORF1 and ORF2,
with particular focus on the different regions of the L1 ORF2
protein. Unlike the previous study, we subdivided the ORF2p
into four distinct regions: the endonuclease domain (endo),
the region between the endonuclease and RT (inter
endo-RT), RT domain, and the carboxy terminus containing
the "zinc-knuckle" cysteine-rich domain (cys). Our analysis
confirmed the previous observations that ORF2p generally
shows signs of purifying selection when using the full-length
sequence for the analysis. However, in contrast to other regions
and changes between other L1 subfamilies, the cys
domain experienced a notable increase in amino acid substitution
rate at approximately 18-20 Ma, during the transition
from the L1PA5 to the L1PA4 subfamilies (table 1, bold font).
Interestingly, this rapid protein evolution appears to have
occurred during an evolutionary time frame that coincides
with the ending of a long period of highly permissive L1
trans-mobilization of Alu and processed pseudogenes and
the emergence of the SVA retroelement (Ohshima et al.
2003; Wang et al. 2005). The concurrence of ORF2p cys
domain evolution with changes in ORF1p is noteworthy,
but any relationship between the two observations can
only be speculative at this time. Thus, on the basis of these
data, we decided to reconstruct L1PA4 as our second selection.
Together with L1PA8, we have two ancestral L1 elements
that contain the ancestral (L1PA8) and derived (L1PA4) protein
sequences spanning this notable period of rapid evolution
and that coincided with the observed changes in Alu
subfamily evolution (fig. 1).

Construction of Extinct L1 Elements 
We used the published L1PA4 and L1PA8 consensus
(Khan et al. 2006) to generate the presumed ancestral sequences
for our reconstructed L1 elements. This method is
not ideal for typical gene trees where older substitutions tend
to outnumber younger substitutions in samples of extant
sequences. However, given the unique evolutionary dynamics
of retroelements, L1 gene trees resemble star phylogenies with
a few active elements within a subfamily giving rise to numerous
additional copies (Arndt et al. 2003). Thus, sampling
biases generated by substitution timeframes should have
a negligible effect on the assumption that ancestral L1 subfamily
sequences are likely to resemble the consensus
sequence of human reference assembly genomic copies corrected
for CpG mutations. Another concern is that most
of the L1 sequence data available for alignments consists
of 5'-truncated elements, making it more difficult to generate
reliable consensus sequence for ORF1p and the
N-terminus of the ORF2p. Thus, particular attention was
given to these regions during the verification of the consensus
sequences.

L1 elements generate limited amounts of full-length RNA
due to internal splice sites (Belancio et al. 2008), internal pAs
(Perepelitsa-Belancio and Deininger 2003), and overall
A-richness (Han et al. 2004), making it difficult to quantitatively
differentiate L1 subfamilies with respect to
cis-retrotransposition rates and trans-mobilization of Alu.
These factors could potentially also lead to the translation
of differing amounts of ORF1p and ORF2p. We wished to
specifically characterize any role(s) that protein sequence differences
might have between subfamilies. Therefore, we
codon optimized consensus sequences to reconstruct
extinct L1 elements with unchanged amino acid sequences
but with strategic changes at synonymous codon positions to
reduce transcriptional and translational variation between
elements. Codon-optimized L1 elements have previously
been created with amino acid sequences identical to active
modern human and rodent elements (Han and Boeke
2004; Wagstaff et al. 2011). In these published cases, the synthetic
L1s behave like wild-type L1s in a
cultured cell retrotransposition assay but retrotranspose
more efficiently than the equivalent
wild-type L1 elements. For the design and synthesis of
our synthetic L1PA4 and L1PA8 full-length and ORF2
alone constructs, we followed the same codon optimization
and plasmid assembly strategy we previously used for the
synthesis of L1PA1 (see Materials and Methods) (Wagstaff
et al. 2011). To add further evidence of functionality, we
also reconstructed a wild-type (nonoptimized) L1PA8 construct
for analysis and comparison alongside the synthetic
version.

The synthetic L1PA8 and L1PA4 consensus sequences
were cloned into constructs that would either support
expression of the ORF2 protein, the expression of the
full-length L1 (untagged), or the expression of an L1
with a neomycin cassette (tagged) that would allow evaluation
of retrotransposition in a culture assay system
(fig. 2). We initially tested the retrotransposition competence
of the ORF2 constructs by assaying their ability to
support Alu retrotransposition in cultured HeLa cells and
found that the consensus L1PA8 ORF2 was unable to
drive Alu retrotransposition. Given that Alu retrotransposition
is readily supported by both human and rodent
L1 ORF2 sources, including chimeric human-rodent
ORF2s (Wagstaff et al. 2011), we decided to re-evaluate
the L1PA8 consensus sequence. Because current L1PA8
human genome copies are often truncated and highly
battered, we sought to determine whether manual editing
of the sequence could correct errors that emerge from
automated consensus building.

To identify genomic copies of L1PA8, we used the
published L1PA8 consensus as a BLAT query (UCSC Genome
Browser, hg19 Assembly: http://genome.ucsc.edu/cgi-bin/hgBlat)
and identified 23 genomic copies that were
full-length or near-full-length elements annotated as L1PA8.
To validate these L1 copies as belonging to the L1PA8
subfamily, we submitted these 23 copies to RepeatMasker
(http://www.repeatmasker.org/). RepeatMasker identified
13 copies as L1PA8 and 10 of the copies as belonging to
subfamilies other than L1PA8. Thus, for our final L1PA8 set,
we used only the 13 L1s that RepeatMasker confirmed as
L1PA8 for our subsequent consensus analysis.

Table 3. Rationale for Changes to the L1PA8 Consensus Sequence.

Position^a   Consensus AA   Modified AA   Support for Choice of Modification
47           M              T             Most common residue + CpG site
101          N              K             Most common residue + polymorphic site
104          T              M             Most common residue + CpG site
347          V              L             Most common residue + polymorphic site
375          C              R             Most common residue + polymorphic site
716          P              Q             Most common residue + polymorphic site
755          N              S             Most common residue + polymorphic site
777          M              V             Most common residue + polymorphic site
838          A              T             Most common residue + CpG site
918          N              D             Most common residue + CpG site
1092         V              M             Most common residue + polymorphic site

^a Amino acid sequence numbers using the ORF2 L1RP sequence as reference.

The alignment of these 13 L1PA8 sequences led to a
modified consensus sequence with 11 amino acid changes in
the ORF2 sequence relative to the original consensus. These
11 codon positions are shown for each of the 13 L1PA8
sequences in table 2. In all cases, our modified consensus
is supported by a plurality of the individual sequences.
Table 3 lists the individual changes made to the modified
consensus and the rationale for those changes. Because
CpG dinucleotides mutate approximately 10 times faster
than non-CpG positions as a result of the deamination
of 5-methylcytosine (Bird 1980), we specifically
searched for changes associated with CpGs. Four of the 11
codons contain CpG correction errors, and the remaining
codons were either polymorphic or supported by each of
the individual sequences from our alignment. An example
of an ambiguous amino acid in the ORF2p from L1PA8 is
shown in figure 3. There are several possible explanations for
the differences between our modified consensus and the
original published L1PA8 ORF2 consensus sequence: 1) the use
of different individual elements to construct the consensus
sequence, 2) uncertain alignments, particularly with respect to
small deletions and adjacent nucleotides, and 3) differences
in ascertaining CpG sites. We had the additional benefit of
closely scrutinizing the differences between the modified and
original consensus sequences. Comparison to the closest ancestral
(L1PA8A) and derived (L1PA7) subfamilies of L1PA8 provides
further support for the 11 codon modifications we made (last
two rows of table 2). Before the corrections, 10 of the 11
codons were not shared by either the ancestral or derived
subfamilies. Following the modifications, 9 of the 11 codons
match the corresponding codon for both of these subfamilies,
whereas the remaining two codons match one related
subfamily. Therefore, these changes are the most parsimonious
with respect to sequence polymorphisms and evolutionary
progression of subfamilies. A complete sequence alignment of
the amino acids changed for the L1PA8 ORF2 is shown in
supplementary figure S1, Supplementary Material online. We used
similar precautionary measures but identified no amino acids
to modify for the L1PA8 ORF1 or for the ORF1 and ORF2 of the
L1PA4 consensus sequence. Our wild-type L1PA8 construct
also contains these 11 modified amino acids. We assembled
the L1 and Alu sequences into tagged and/or untagged
constructs (fig. 2) to evaluate cis- and trans-mobilization in
cultured HeLa cells.

Fig. 3. Revision of the L1PA8 sequence. Example of the approach used
in the identification of L1PA8 consensus codon sequences conforming
to the criteria for modification. The top panel shows an alignment of
the amino acid sequence positions 40-48 of the published consensus
sequences of L1PA subfamilies (Khan et al. 2006) that allowed for the
identification of methionine (M) at codon 47 of the ORF2 protein
(circled) as a potential L1PA8-specific change. The bottom panel
shows a nucleotide sequence alignment of ORF2 protein codon 47
(flanking sequence is represented by dots) from our subset of full-length
L1PA8 copies. The sequences are highly variable due to the presence of a
CpG (C to T or G to A). The original L1PA8 consensus had a methionine
at this position due to a CpG correction error. However, the alignment
of our 13 L1PA8 copies supports the threonine codon as the most likely
to have been present in the active L1PA8 element. The presence of a
threonine is further supported by the observation that the other L1PA
subfamilies in the time periods flanking L1PA8 also contain a threonine
(T) at this position. Using these criteria, we corrected codon 47 of ORF2
of the L1PA8 consensus.
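
The codon-level reasoning described above (take the plurality codon across
the aligned copies, then check whether competing codons are simply CpG decay
products) can be illustrated with a short Python sketch. This is only a
minimal illustration under stated assumptions, not the procedure actually
used for the alignment: the function names are hypothetical, and the codon
column is invented for the example rather than taken from the 13 L1PA8 copies.

```python
from collections import Counter

def plurality_codon(codons):
    """Return (codon, count) for the most frequent fully resolved codon."""
    resolved = [c.upper() for c in codons
                if len(c) == 3 and set(c.upper()) <= set("ACGT")]
    if not resolved:
        raise ValueError("no fully resolved codons in this column")
    return Counter(resolved).most_common(1)[0]

def is_cpg_decay(ancestral, derived):
    """True if `derived` differs from `ancestral` by exactly one substitution
    that turns a CpG into TpG (C to T) or CpA (G to A)."""
    if len(ancestral) != len(derived):
        return False
    diffs = [i for i, (a, d) in enumerate(zip(ancestral, derived)) if a != d]
    if len(diffs) != 1:
        return False
    i = diffs[0]
    # C to T, where the C is immediately followed by G
    if ancestral[i] == "C" and derived[i] == "T" and ancestral[i + 1:i + 2] == "G":
        return True
    # G to A, where the G is immediately preceded by C
    if ancestral[i] == "G" and derived[i] == "A" and i > 0 and ancestral[i - 1] == "C":
        return True
    return False

# Hypothetical codon column across 13 aligned copies (NOT real data):
# ACG = Thr (contains a CpG), ATG = Met, ACA = Thr, '---' = gap.
column = ["ACG"] * 5 + ["ATG"] * 4 + ["ACA"] * 3 + ["---"]
consensus, support = plurality_codon(column)
decayed = sorted(c for c in set(column) if is_cpg_decay(consensus, c))
print(consensus, support, decayed)
```

In this invented column the threonine codon ACG is the plurality choice, and
both competing codons (ATG and ACA) are one CpG transition away from it, which
mirrors the kind of argument made for codon 47. A CpG that spans a codon
boundary would require passing a window that includes the flanking base.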

Evaluation of the Reconstructed L1s
The reconstructed full-length L1PA4 and L1PA8 elements
proved to be retrocompetent in HeLa cells (fig. 4A). Our
optimized version of the L1PA1 element has previously
been shown to be highly retrocompetent and more active
than wild-type L1 in cultured cells (Wagstaff et al. 2011). The
optimized L1PA8 element shows a slightly higher
retrotransposition efficiency relative to L1PA1 (~125%, paired t-test
P < 0.001). As with previous comparisons between optimized
and wild-type L1 elements, the optimized version of L1PA8 is
more active in this assay than its wild-type counterpart
(paired t-test P < 0.001). Considering that the L1PA1 is the
optimized version of the most active human L1 reported, the
L1RP, this indicates that both our optimized L1PA4 and
L1PA8 constructs are highly efficient.

We performed two separate controls to confirm that
the colonies from the L1PA4 and L1PA8 transfections
represented genuine retrotransposition events. First, we
harvested HeLa DNA from colony pools and showed by
PCR analysis that L1PA4 and L1PA8 inserts contain the
resistance tag with the intron spliced out (fig. 4B, top
panel). Because splicing only occurs in transcripts
generated by the CMV promoter of our tagged L1 constructs,
this confirms that the antibiotic resistance is not due to
protein expression from unincorporated plasmid in
transfected cells. We further show that colony formation does
not occur in the presence of the RT inhibitor d4t (fig. 4B,
bottom panel), which has previously been shown to
effectively inhibit L1 retrotransposition in HeLa cells
(Kroutter et al. 2009).

The codon-optimized, neomycin-tagged L1 constructs
generated equivalent amounts of spliced full-length L1 transcripts
(fig. 4C). As expected, the wild-type constructs (PA8wt and
L1.3wt) have lower transcription levels than the optimized
versions. Although there is approximately a 30-fold difference
in the amount of transcript generated between the
codon-optimized and the wild-type constructs, retrotransposition
rates only differ by ~1.4-fold for L1PA8 and ~2.2-fold for
L1PA1, indicating a nonlinear relationship between the
amount of L1 RNA and insertional capability, as has
previously been observed (An et al. 2011).
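
To make the nonlinearity concrete, one can ask what exponent would relate the
two fold changes if retrotransposition scaled as a power of RNA abundance.
The power-law form is purely an illustrative assumption (it is not a model
proposed in the text), and `scaling_exponent` is a hypothetical helper. The
sketch uses the approximately 30-fold RNA difference and the approximately
2.2-fold retrotransposition difference quoted above for L1PA1.

```python
import math

def scaling_exponent(rna_fold, retro_fold):
    """Exponent k such that retro_fold == rna_fold ** k.

    Assumes, for illustration only, a power-law relationship between
    transcript abundance and retrotransposition rate.
    """
    return math.log(retro_fold) / math.log(rna_fold)

# Fold changes quoted in the text for optimized versus wild-type L1PA1.
k = scaling_exponent(30.0, 2.2)
print(round(k, 2))  # a linear relationship would give k = 1
```

Under this assumption the exponent is well below 1, which is one way of
expressing that a large change in L1 RNA produces only a modest change in
insertion frequency.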

Transmobilization of Old and Young Alu Subfamilies 
We generated a set of tagged Alu constructs comprising the
consensus sequences of the young, currently active
subfamilies (Alu Ya5 and Alu Y), an intermediate (Alu Sg1, previously
known as Alu "AS" [Shen et al. 1991; Batzer et al. 1996]), and
two older subfamilies (Alu Sx and Alu Jo). Expression analysis
of the Alu constructs demonstrates equivalent expression
between all the tagged Alu subfamily transcripts (fig. 5A).
We also verified that the RNA and protein (ORF1p)
expression levels of the driver L1s were equivalent for the vectors of
the three different L1 subfamilies (fig. 5B). We next evaluated
these modern and ancestral retroelement constructs to test
for variation in Alu retrotransposition efficiency when driven
by the different L1 subfamilies in culture. Because Alu only
requires ORF2p for retrotransposition (Dewannieux et al.
2003; Wallace et al. 2008), we first chose to evaluate the
effect of L1PA1, L1PA4, and L1PA8 ORF2p on Alu subfamily
activity (fig. 5C). Under these conditions, our negative
controls showed no background (G418-resistant colonies) when
the Alu construct was not supplemented with ORF2p
(supplementary figs. S3 and S4, Supplementary Material online).
The younger Alu elements consistently showed higher
retrotransposition efficiency than the older Alu Jo when driven by
the ORF2p of the younger L1s (PA1 and PA4; P < 0.001).
However, there are no significant differences in Alu subfamily
activity when the ORF2p of L1PA8 drives retrotransposition.
Instead, retrotransposition efficiency of the younger Alu
elements decreases to levels comparable to Alu Jo (supplementary
fig. S3A, Supplementary Material online). These results are
consistently observed even when varying transfection
conditions by using different Alu/ORF2 ratios (supplementary
fig. S4, Supplementary Material online). Performing the Alu
subfamily retrotransposition analysis using full-length
optimized L1 elements to drive retrotransposition showed similar
results (fig. 5D) but with a lower retrotransposition efficiency
(supplementary fig. S3B, Supplementary Material online).
Under these conditions, the difference in retrotransposition
efficiency between Alu Jo and the younger Alu subfamilies
was only observed with L1PA1. Although the Alu Sg1
(~25-35 Ma) shows a trend for a higher retrotransposition
rate relative to the other Alu subfamilies, due to the intrinsic
experimental variability, it is not significantly different 
(P = 0.3S5). 
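
The relative efficiencies discussed in this section are colony counts
expressed as a percentage of a reference construct (Alu Ya5 or L1PA1; see the
legends of figs. 4 and 5). The Python sketch below shows one plausible way to
compute such values. It is an assumption-laden illustration: the function
names are hypothetical, the colony counts are invented, and normalizing each
replicate to the mean of the reference is only one possible convention.

```python
from math import sqrt
from statistics import mean, stdev

def percent_of_reference(counts, reference_counts):
    """Express each replicate count as a percentage of the reference mean."""
    ref_mean = mean(reference_counts)
    return [100.0 * c / ref_mean for c in counts]

def mean_sem(values):
    """Return (mean, standard error of the mean) for replicate values."""
    return mean(values), stdev(values) / sqrt(len(values))

# Invented G418-resistant colony counts from four replicate experiments.
ya5_counts = [200, 220, 180, 200]   # hypothetical reference (Alu Ya5)
jo_counts = [100, 120, 90, 90]      # hypothetical older subfamily (Alu Jo)

jo_percent = percent_of_reference(jo_counts, ya5_counts)
jo_mean, jo_sem = mean_sem(jo_percent)
print(f"Alu Jo: {jo_mean:.1f}% +/- {jo_sem:.1f}% of Alu Ya5")
```

With these made-up counts the older subfamily sits at about half the
reference level; the numbers carry no biological meaning and serve only to
show the arithmetic behind normalized columns and SEM error bars.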

Discussion 

Our data demonstrate that the use of consensus L1
sequences is a viable approach for the reconstruction of
extinct L1 subfamilies. However, our initial failure to produce
a retrocompetent L1PA8 ORF2 sequence demonstrated the
limitations to the approach, particularly for older
subfamilies. The primary stumbling block is the reliability of the data
used to derive the consensus sequence. In particular, the
nucleotide changes caused by the deamination of
methylated CpGs present in the sequences used to build the
consensus require careful attention. In the case of L1PA8
ORF2, 4 out of the 11 identified amino acid changes could
be attributed to CpG-derived sequence changes. The linear
progression of L1 subfamilies provides an additional layer for
the analysis of L1 consensus sequences. By comparing
temporally adjacent subfamilies (i.e., closely related), amino acid
substitutions that appear as singletons (not present in
ancestral or derived subfamilies) can be closely scrutinized
to make sure CpG or polymorphism correction errors do
not occur.
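
The singleton check described above can be sketched in a few lines of Python.
This is a hedged illustration rather than the analysis pipeline itself:
`singleton_positions` is a hypothetical helper, and the three peptide windows
are invented placeholders for the aligned consensus sequences of an ancestral,
a target, and a derived subfamily.

```python
def singleton_positions(older, target, younger, start=1):
    """Return positions where the target residue matches neither the
    temporally older nor the temporally younger subfamily consensus.

    All three sequences must already be aligned to equal length;
    `start` is the sequence coordinate of the first aligned column.
    """
    if not (len(older) == len(target) == len(younger)):
        raise ValueError("sequences must be aligned to equal length")
    return [
        start + i
        for i, (o, t, y) in enumerate(zip(older, target, younger))
        if t != o and t != y
    ]

# Invented nine-residue windows covering positions 40-48 (NOT real data).
older_consensus = "GKSLQETTC"
target_consensus = "GKSLQETMC"
younger_consensus = "GKSLQETTC"

flagged = singleton_positions(
    older_consensus, target_consensus, younger_consensus, start=40
)
print(flagged)
```

A flagged position is only a candidate for closer inspection; whether it
reflects a genuine subfamily-specific change or a consensus-building error
still has to be judged from the underlying nucleotide alignment.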

The insertional history of L1 and Alu in primate genomes
consists of a linear progression of subfamilies, with only brief
temporal overlaps between ancestral subfamilies and the
derived subfamilies that replace them. Previous phylogenetic
and genetic distance analyses of ancestral LINEs and SINEs
(Shen et al. 1991; Ohshima et al. 2003; Khan et al. 2006;
Bennett et al. 2008) have shown that insertion rates vary
over time, with some subfamilies reaching much higher
copy numbers than others. There is no indication of a
positive correlation for insertion rate between LINEs and SINEs
across evolutionary time, suggesting that if there were
lenient and restrictive insertional time periods, those periods
were not the same for L1 and Alu. Instead, the historical
amplification patterns of L1 and Alu suggest a possible
negative relationship, with L1 showing a relatively high
insertion rate that only decreases with the emergence and
proliferation of Alu (fig. 1). Peak Alu amplification also
coincides with peak formation of processed pseudogenes
(Ohshima et al. 2003). This may indicate a period of general
genomic leniency for new genomic inserts, except that the
corresponding L1 insertion rate is comparatively low.
Alternatively, one or more of the active L1 subfamilies
from this period may have been especially vulnerable to
nonautonomous elements. The period corresponding to
the more recent expansion of Alu Y is interesting for a
couple of reasons. Peak Alu Y amplification (fig. 1)
corresponds with the emergence and proliferation of the
nonautonomous SVA retroelement 18-25 Ma (Wang
et al. 2005) and the rapid evolution of ORF1p and ORF2p
during the transition from L1PA5 to L1PA4 (table 1).
Whether both L1 proteins evolved in response to Alu
and/or SVA competition, host factors, or other evolutionary
pressures remains to be determined.

Fig. 4. The reconstructed L1PA4 and L1PA8 are retrotransposition competent. (A) Relative retrotransposition efficiencies of reconstructed L1 elements
in HeLa cells. The retrotransposition capability of the individual tagged L1 constructs: codon-optimized L1PA1 (PA1), L1PA4 (PA4), and L1PA8 (PA8)
and the wild-type L1PA8 (PA8wt) and JM101/L1.3 (L1.3wt) are shown. Columns represent the G418R colony means normalized relative to the L1PA1
with the standard deviation shown as error bars. The mean number of G418R colonies before normalization for the L1PA1 is shown above the column. Results
from Student's paired t-test are indicated (n > 4). The panel on the right shows the mean ± standard error of the mean (SEM) of G418-resistant colonies
observed for each L1 construct evaluated (n > 4). (B) Verification of L1PA4 and L1PA8 retrotransposition events. Top panel shows the PCR analysis of
HeLa cells that were transfected with the tagged L1PA1, L1PA4, and L1PA8 vectors. PCR analysis was performed using primers designed to anneal to the
sequence flanking the intron disrupting the neomycin gene (neo). Plasmid DNA from each L1 construct was used as unspliced control (left).
The annealing locations of the primers are shown in the schematic of the neo cassette plus intron (1,233 bp) and without intron (330 bp). DNA
from the tagged L1 plasmids: L1PA1 (1), L1PA4 (4), and L1PA8 (8) was used as control for the unspliced cassette. Results from DNA extracts from pooled
G418R colonies generated by the indicated L1 constructs are shown as "Inserts." ST, size standard lanes are indicated. The lower panel shows a
representative L1PA4 and L1PA8 retrotransposition experiment in the presence or absence of the RT inhibitor d4t. (C) Evaluation of the RNA profiles of
the reconstructed L1 constructs. HeLa cells were transiently transfected with the tagged optimized L1PA1ch (PA1), L1PA4ch (PA4), L1PA8ch (PA8), and
"wild type" L1PA8wt (PA8wt) and L1.3 construct JM101/L1.3 (L1.3wt). Poly-A selected RNA was hybridized with a strand-specific riboprobe to the
neomycin resistance gene or to beta-actin (bottom panel indicated by C). The full-length unspliced tagged L1 transcript (arrow) and the full-length
transcript with spliced neo tag (arrowhead) are indicated. The top panel shows a longer exposure of the blot. The dotted box highlights the location of
the weak signal of the L1PA8 wild-type transcripts for easier visualization. The faster migrating bands are likely common splice transcript variants
previously shown to be generated by L1 elements (Belancio et al. 2006; Wagstaff et al. 2011). The spliced full-length L1 transcript was normalized to
beta-actin and calculated relative to the L1PA1 construct (designated as 1.0). The mean ± SEM for the quantification results for each construct is indicated
below (n = 3). No significant differences were observed between the optimized L1PA1, L1PA4, and L1PA8 spliced transcripts (one-way analysis of
variance, F = 0.80, P = 0.923604; n = 5). RNA levels for L1PA8wt and L1.3 are significantly lower than their optimized L1 counterparts (paired t-test,
P < 0.0001; n = 3).

Fig. 5. L1PA4 and L1PA8 support retrotransposition of ancestral Alu subfamilies. (A) Evaluation of the RNA profiles of the different tagged Alu
subfamily constructs. Northern blot analysis of poly-A selected RNA extracts was performed from HeLa cells transiently transfected with the tagged
constructs of five different Alu subfamilies that were active during distinct evolutionary periods. The unspliced (arrow) and spliced (arrowhead)
neo-tagged Alu transcripts are indicated. The spliced Alu transcripts were normalized to beta-actin (C, loading control) and expressed relative to the
AluYa5 that was arbitrarily designated as 1.0. The mean ± standard error of the mean (SEM) for the quantitation results for each construct is indicated
below (n = 3). No significant differences were observed between the Alu subfamily transcripts (one-way analysis of variance, F = 0.46, P = 0.763722; n = 3).
(B) Evaluation of RNA expression from the ORF1 or ORF2 constructs. Poly-A selected RNA and protein levels were evaluated from HeLa cells transiently
transfected with the codon-optimized ORF1 and ORF2 protein expression vectors from the different L1 subfamilies (PA1, PA4, and PA8). RNA blots
were hybridized with a riboprobe complementary to the 3'-region of the ORF1/ORF2 transcript indicated by the arrow and a riboprobe complementary
to beta-actin mRNA as control (C). Extracts were obtained from HeLa cells transiently transfected with the codon-optimized myc-tagged L1 ORF1 protein
expression vector from the different subfamilies (PA1, PA4, and PA8). Protein blots were incubated with anti-myc indicated by the arrow and
anti-beta-actin as control (C). (C) Retrotransposition of the tagged consensus Alu subfamilies (Ya5, Yb9, Sg1, Sx, and Jo) driven by the ORF2 protein of the
different L1 subfamilies (L1PA1, L1PA4, and L1PA8). The Alu Ya5 data were used to define 100%. The mean number of G418R colonies before normalization for
the AluYa5 is shown above the column. Columns and error bars represent the % mean G418R colonies ± SEM. P values indicate that the
retrotransposition efficiency of the older Alu element (AluJo) is significantly lower than the modern AluYa5 when the ORF2 driver is from L1PA1 or L1PA8
(paired t-test, P < 0.001). (D) Retrotransposition of the tagged consensus Alu subfamilies (Ya5, Y, Sg1, Sx, and Jo) driven by the full-length L1 of the
different subfamilies (L1PA1, L1PA4, and L1PA8). The mean number of G418R colonies before normalization for the AluYa5 is shown above the column.
Columns and error bars formatted as in C. P values indicate that the retrotransposition efficiency of the older Alu element (AluJo) is significantly lower
than the modern AluYa5 when L1PA1 is the driver element (paired t-test, P = 0.037; n > 3).

There is a slight indication of differential interaction
between younger L1 elements and the different Alu subfamilies.
However, the small observed difference between modern and
ancestral L1 elements is unlikely, on its own, to explain the
changing insertional dynamics of Alu amplification. Other
explanations for the evolutionary pattern of Alu amplification
exist. The Alu "master" or "source" element model suggests
the existence of a small number of hyperactive source
elements that are responsible for the accumulation of the new Alu
copies (Deininger et al. 1992). Stochastic changes in the
number of source elements during any given time period
could be a factor in determining Alu amplification patterns.
In addition, Alu amplification dynamics may have been
significantly influenced by "stealth-driver" elements (Han et al.
2005), with the appearance of short-lived hyperactive copies
regulating Alu amplification dynamics. This pattern is
apparent in the analysis of the orangutan genome. The low
number of orangutan-specific Alu insertions may be because
of low "stealth" Alu amplification (Walker et al. 2012) in a
genome lacking short-lived hyperactive Alu copies. Thus, the
combination of population dynamics and stochastic variation
in active Alu elements has likely played a role in Alu subfamily
proliferation and evolution.

A limitation to the investigation of ancestral LINE and SINE 
elements is the inability to replicate the exact cellular envir- 
onments that existed during their proliferation. Any inter- 
actions between LINEs and SINEs are likely to be mediated 
by cellular factors and those interactions could well be lost in 
living tissues and immortalized cell lines. Multiple 
studies show that endogenous retroelement activity can 
be regulated by cellular factors (reviewed in Levin and 
Moran 2011). Examples include the human APOBEC3
family of cytidine deaminases (Bogerd et al. 2006), the 
MOV10 superfamily 1 putative RNA helicase (Arjan-Odedra 
et al. 2012), the 3'-repair exonuclease 1, TREX1 (Stetson et al. 
2008), and "flap" endonuclease XPF/ERCC1 heterodimer 
(Gasior et al. 2008). In addition, different interfering RNA- 
based mechanisms, including siRNAs and piRNAs, have 
been shown to inhibit mobile elements (reviewed in Levin 
and Moran 2011). Because of the possibility for coevolution 
with parasitic mobile elements, host factors may evolve rap- 
idly, leading to changing cellular environments. For example, 
antagonistic interactions between primates and their retro- 
viruses or retroelements can lead to rapid evolution of host 
factors to limit their proliferation. Several recent studies have 
shown that APOBEC genes have evolved rapidly in human 
ancestors and differentially regulate retrovirus and/or retro- 
element activity in primates (OhAinle et al. 2006; Stenglein 
and Harris 2006; Niewiadomska et al. 2007; Tan et al. 2009; 
Duggal et al. 2011). These interactions can lead to a state of 
perpetual coevolution between cellular factors and patho- 
gens. Whether APOBEC genes directly target retroelements 
or affect them indirectly because of their interaction with 
retroviruses is undetermined. Although Alu requires L1
proteins to retrotranspose, there are examples of some factors
that differentially affect L1 and SINE mobilization (Hulme
et al. 2007; Kroutter et al. 2009; Ichiyanagi et al. 2011). Our 
inability to measure any major differential interactions be- 
tween ancestral LINE and SINE subfamilies could simply be 
because the mediating cellular factors are no longer active in 
modern humans. 



Either way, the historical activity of LINEs and SINEs has 
likely been influenced by host factors that evolve to combat 
changing cellular threats and stochastic events that affect the 
number of active elements at any given period. We are 
currently evaluating the influence of cellular factors on LINE 
and/or SINE subfamily activity. 

Supplementary Material 

Supplementary figures S1-S4 are available at Molecular
Biology and Evolution online
(http://www.mbe.oxfordjournals.org/).